πŸ•ΈοΈ Ada Research Browser

2026-03-07.md

Skilled Deep Research β€” Improvement: Site Crawling

Issue identified: The current skill fetches only the individual URLs surfaced by search β€” there is no site traversal. If a site has multiple subpages with relevant content (e.g., cmmcaudit.org has separate pages for SSP templates, policy templates, and assessment tools), workers only find what search engines surface. Deep links, pagination, and resource indexes are missed.

What's needed: A crawl mode where workers can:

1. Fetch a root URL and extract all internal links
2. Score/filter those links for relevance (e.g., regex on anchor text: "template", "download", "docx", "SSP", "POA&M")
3. Enqueue high-relevance links for fetching
4. Respect a depth limit (e.g., max 2 levels) and the domain boundary
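The scoring and boundary checks in steps 2–4 could look roughly like this minimal sketch. The keyword list comes from the note; the scoring scheme (keyword hit counts) and the threshold are illustrative assumptions, not a settled design:

```python
import re
from urllib.parse import urljoin, urlparse

# Relevance keywords taken from the note; weighting is an assumption.
RELEVANCE_PATTERN = re.compile(r"template|download|docx|SSP|POA&M", re.IGNORECASE)

def is_same_domain(root_url: str, href: str) -> bool:
    """Domain boundary (step 4): only follow links on the root's host."""
    return urlparse(urljoin(root_url, href)).netloc == urlparse(root_url).netloc

def score_link(anchor_text: str, href: str) -> int:
    """Step 2: count keyword hits in the anchor text and the URL itself."""
    return len(RELEVANCE_PATTERN.findall(anchor_text)) + len(
        RELEVANCE_PATTERN.findall(href)
    )

def select_links(root_url: str, links: list[tuple[str, str]], threshold: int = 1) -> list[str]:
    """Steps 2–3: keep same-domain links whose score meets the threshold."""
    return [
        urljoin(root_url, href)
        for anchor_text, href in links
        if is_same_domain(root_url, href) and score_link(anchor_text, href) >= threshold
    ]
```

A worker would feed the `(anchor_text, href)` pairs extracted in step 1 into `select_links` and enqueue the result for fetching.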

Suggested implementation:

- Add a crawl helper script to scripts/ that wraps the fetch script plus link extraction (python3 with BeautifulSoup or regex)
- Give the worker prompt a new instruction block: "If a page appears to be a resource index (templates, downloads, tools), extract internal links and crawl up to 2 levels deep"
- Alternatively, make this a dedicated crawler worker type spawned alongside domain workers
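The helper script's core could be a breadth-first crawl like the sketch below. It uses regex link extraction (the note's fallback when BeautifulSoup is unavailable), and takes `fetch` and `is_relevant` as injected callables β€” an assumption made here so it can wrap whatever fetch script the skill already ships:

```python
import re
from collections import deque
from urllib.parse import urljoin, urlparse

# Naive href extraction; a real helper might use BeautifulSoup instead.
HREF_RE = re.compile(r'<a[^>]+href="([^"#]+)"', re.IGNORECASE)

def crawl(root_url, fetch, is_relevant, max_depth=2):
    """Breadth-first crawl up to max_depth levels, staying on the root domain.

    fetch: callable url -> html (injected, e.g. a wrapper around the fetch script)
    is_relevant: callable href -> bool (e.g. the regex scoring filter)
    """
    domain = urlparse(root_url).netloc
    seen = {root_url}
    queue = deque([(root_url, 0)])
    pages = {}
    while queue:
        url, depth = queue.popleft()
        pages[url] = fetch(url)
        if depth >= max_depth:
            continue  # depth limit: do not expand links from leaf pages
        for href in HREF_RE.findall(pages[url]):
            link = urljoin(url, href)
            if urlparse(link).netloc != domain:
                continue  # domain boundary: skip off-site links
            if link not in seen and is_relevant(href):
                seen.add(link)
                queue.append((link, depth + 1))
    return pages
```

Injecting `fetch` also makes the crawler testable offline with a dict of canned pages, and keeps it agnostic about whether pages come from the existing fetch script or a dedicated crawler worker.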

Sean's comment: "I don't think any deep research skill is complete without that." Priority: High β€” this is a fundamental capability gap.